1.
Sci Rep ; 14(1): 7831, 2024 04 03.
Article in English | MEDLINE | ID: mdl-38570569

ABSTRACT

The objective of this study is to develop and evaluate natural language processing (NLP) and machine learning models to predict infant feeding status from clinical notes in the Epic electronic health records system. The primary outcome was the classification of infant feeding status from clinical notes using Medical Subject Headings (MeSH) terms. Annotation of notes was completed using TeamTat to uniquely classify clinical notes according to infant feeding status. We trained 6 machine learning models to classify infant feeding status: logistic regression, random forest, XGBoost gradient descent, k-nearest neighbors, and support-vector classifier. Model comparison was evaluated based on overall accuracy, precision, recall, and F1 score. Our modeling corpus included an even number of clinical notes that was a balanced sample across each class. We manually reviewed 999 notes that represented 746 mother-infant dyads with a mean gestational age of 38.9 weeks and a mean maternal age of 26.6 years. The most frequent feeding status classification present for this study was exclusive breastfeeding [n = 183 (18.3%)], followed by exclusive formula bottle feeding [n = 146 (14.6%)], and exclusive feeding of expressed mother's milk [n = 102 (10.2%)], with mixed feeding being the least frequent [n = 23 (2.3%)]. Our final analysis evaluated the classification of clinical notes as breast, formula/bottle, and missing. The machine learning models were trained on these three classes after performing balancing and down sampling. The XGBoost model outperformed all others by achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%. Our results demonstrate that natural language processing can be applied to clinical notes stored in the electronic health records to classify infant feeding status. 
Early identification of breastfeeding status using NLP on unstructured electronic health records data can be used to inform precision public health interventions focused on improving lactation support for postpartum patients.
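
The macro-averaged figures reported above weight every class equally, whatever its size. As a minimal sketch of that arithmetic (the labels and predictions below are invented for illustration and are not data from the study; the function name `macro_metrics` is an assumption):

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        p = tp[label] / predicted if predicted else 0.0
        r = tp[label] / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical three-class example (breast, formula, missing).
y_true = ["breast", "breast", "formula", "formula", "missing", "missing"]
y_pred = ["breast", "formula", "formula", "formula", "missing", "breast"]
accuracy, precision, recall, f1 = macro_metrics(y_true, y_pred)
```

Because the per-class scores are averaged without weighting, a balanced corpus such as the one described makes macro and overall figures land close together.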


Subjects
Machine Learning , Natural Language Processing , Female , Humans , Infant , Software , Electronic Health Records , Mothers
2.
JAMIA Open ; 6(4): ooad085, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37799347

ABSTRACT

Objectives: To develop and test a scalable, performant, and rule-based model for identifying 3 major domains of social needs (residential instability, food insecurity, and transportation issues) from the unstructured data in electronic health records (EHRs). Materials and Methods: We included patients aged 18 years or older who received care at the Johns Hopkins Health System (JHHS) between July 2016 and June 2021 and had at least 1 unstructured (free-text) note in their EHR during the study period. We used a combination of manual lexicon curation and semiautomated lexicon creation for feature development. We developed an initial rules-based pipeline (Match Pipeline) using 2 keyword sets for each social needs domain. We performed rule-based keyword matching for distinct lexicons and tested the algorithm using an annotated dataset comprising 192 patients. Starting with a set of expert-identified keywords, we tested the adjustments by evaluating false positives and negatives identified in the labeled dataset. We assessed the performance of the algorithm using measures of precision, recall, and F1 score. Results: The algorithm for identifying residential instability had the best overall performance, with a weighted average for precision, recall, and F1 score of 0.92, 0.84, and 0.92 for identifying patients with homelessness and 0.84, 0.82, and 0.79 for identifying patients with housing insecurity. Metrics for the food insecurity algorithm were high but the transportation issues algorithm was the lowest overall performing metric. Discussion: The NLP algorithm in identifying social needs at JHHS performed relatively well and would provide the opportunity for implementation in a healthcare system. Conclusion: The NLP approach developed in this project could be adapted and potentially operationalized in the routine data processes of a healthcare system.
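
Rule-based keyword matching of the kind described above can be sketched as one compiled pattern per social-needs domain. The lexicon entries and the names `LEXICONS`, `compile_lexicon`, and `match_domains` are hypothetical; the study's curated lexicons are not given in the abstract, and this sketch does not handle negation or context.

```python
import re

# Hypothetical lexicons; the published keyword sets are not reproduced here.
LEXICONS = {
    "residential_instability": ["homeless", "eviction", "living in a shelter"],
    "food_insecurity": ["food insecurity", "food pantry", "skipping meals"],
    "transportation_issues": ["no transportation", "lack of transportation"],
}


def compile_lexicon(terms):
    """Build one case-insensitive, whole-phrase pattern from a keyword list."""
    # Longest terms first so multi-word phrases win over shorter overlaps.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


PATTERNS = {domain: compile_lexicon(terms) for domain, terms in LEXICONS.items()}


def match_domains(note):
    """Return {domain: [matched keywords]} for every domain found in a note."""
    found = {}
    for domain, pattern in PATTERNS.items():
        matches = [m.group(0).lower() for m in pattern.finditer(note)]
        if matches:
            found[domain] = matches
    return found


result = match_domains("Patient is currently homeless and relies on a food pantry.")
```

Reviewing false positives and false negatives against a labeled set, as the abstract describes, would then amount to adding or removing entries in the lexicons and re-running the match.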

3.
Front Artif Intell ; 6: 1229609, 2023.
Article in English | MEDLINE | ID: mdl-37693012

ABSTRACT

Purpose: Between 30 and 68% of patients prematurely discontinue their antidepressant treatment, posing significant risks to patient safety and healthcare outcomes. Online healthcare forums have the potential to offer a rich and unique source of data, revealing dimensions of antidepressant discontinuation that may not be captured by conventional data sources. Methods: We analyzed 891 patient narratives from the online healthcare forum, "askapatient.com," utilizing content analysis to create PsyRisk, a corpus highlighting the risk factors associated with antidepressant discontinuation. Leveraging PsyRisk, alongside PsyTAR [a publicly available corpus of adverse drug reactions (ADRs) related to antidepressants], we developed a machine learning-driven algorithm for proactive identification of patients at risk of abrupt antidepressant discontinuation. Results: From the analyzed 891 patients, 232 reported antidepressant discontinuation. Among these patients, 92% experienced ADRs, and 72% found these reactions distressful, negatively affecting their daily activities. Approximately 26% of patients perceived the antidepressants as ineffective. Most reported ADRs were physiological (61%, 411/673), followed by cognitive (30%, 197/673), and psychological (28%, 188/673) ADRs. In our study, we employed a nested cross-validation strategy with an outer 5-fold cross-validation for model selection, and an inner 5-fold cross-validation for hyperparameter tuning. The performance of our risk identification algorithm, as assessed through this robust validation technique, yielded an AUC-ROC of 90.77 and an F1-score of 83.33. The most significant contributors to abrupt discontinuation were high perceived distress from ADRs and perceived ineffectiveness of the antidepressants. Conclusion: The risk factors identified and the risk identification algorithm developed in this study have substantial potential for clinical application.
They could assist healthcare professionals in identifying and managing patients with depression who are at risk of prematurely discontinuing their antidepressant treatment.

4.
medRxiv ; 2023 Jul 24.
Article in English | MEDLINE | ID: mdl-37546764

ABSTRACT

This study aimed to develop a natural language processing algorithm (NLP) using machine learning (ML) and Deep Learning (DL) techniques to identify and classify documentation of suicidal behaviors in patients with Alzheimer's disease and related dementia (ADRD). We utilized MIMIC-III and MIMIC-IV datasets and identified ADRD patients and subsequently those with suicide ideation using relevant International Classification of Diseases (ICD) codes. We used cosine similarity with ScAN (Suicide Attempt and Ideation Events Dataset) to calculate semantic similarity scores of ScAN with extracted notes from MIMIC for the clinical notes. The notes were sorted based on these scores, and manual review and categorization into eight suicidal behavior categories were performed. The data were further analyzed using conventional ML and DL models, with manual annotation as a reference. The tested classifiers achieved classification results close to human performance with up to 98% precision and 98% recall of suicidal ideation in the ADRD patient population. Our NLP model effectively reproduced human annotation of suicidal ideation within the MIMIC dataset. These results establish a foundation for identifying and categorizing documentation related to suicidal ideation within ADRD population, contributing to the advancement of NLP techniques in healthcare for extracting and classifying clinical concepts, particularly focusing on suicidal ideation among patients with ADRD. Our study showcased the capability of a robust NLP algorithm to accurately identify and classify documentation of suicidal behaviors in ADRD patients.
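
Ranking notes by cosine similarity to a reference text, as described above, can be illustrated with a simple bag-of-words stand-in. This is an assumption made for the sketch: the abstract does not say how the texts were vectorized, and the reference sentence and notes below are invented.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case token counts; a simple stand-in for a real text embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if not norm_a or not norm_b:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_notes(reference, notes):
    """Return (score, note) pairs sorted from most to least similar."""
    ref_vector = bag_of_words(reference)
    scored = [(cosine_similarity(ref_vector, bag_of_words(n)), n) for n in notes]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Invented example texts; not drawn from ScAN or MIMIC.
reference = "patient reports suicidal ideation"
notes = [
    "patient denies chest pain",
    "patient reports suicidal ideation with plan",
    "follow up in two weeks",
]
ranked = rank_notes(reference, notes)
```

Sorting by score puts the notes most likely to contain relevant documentation at the top of the manual review queue.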

5.
J Am Med Inform Assoc ; 30(12): 2036-2040, 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-37555837

ABSTRACT

Despite recent methodology advancements in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and human factor variations. Concurrently, these factors also dramatically increase the difficulty in developing NLP models in multi-site settings, which is necessary for algorithm robustness and generalizability. Here, we reported on our experience developing an NLP solution for Coronavirus Disease 2019 (COVID-19) signs and symptom extraction in an open NLP framework from a subset of sites participating in the National COVID Cohort (N3C). We then empirically highlight the benefits of multi-site data for both symbolic and statistical methods, as well as highlight the need for federated annotation and evaluation to resolve several pitfalls encountered in the course of these efforts.


Subjects
COVID-19 , Natural Language Processing , Humans , Electronic Health Records , Algorithms
6.
Inj Prev ; 29(5): 384-388, 2023 10.
Article in English | MEDLINE | ID: mdl-37399309

ABSTRACT

OBJECTIVES: Falls are the leading cause of non-fatal injury among young children. The aim of this study was to identify and quantify the circumstances contributing to medically attended paediatric fall injuries among 0-4 years old. METHODS: Cross-sectional data for falls among kids under 5 years recorded between 2012 and 2016 in the National Electronic Injury Surveillance System was obtained. A sample of 4546 narratives was manually coded for: (1) where the child fell from; (2) what the child fell onto; (3) the activities preceding the fall and (4) how the fall occurred. A natural language processing model was developed and subsequently applied to the remaining uncoded data to yield a set of 91 325 cases coded for what the child fell from, fell onto, the activities preceding the fall, and how the fall occurred. Data were descriptively tabulated by age and disposition. RESULTS: Children most often fell from the bed accounting for one-third (33%) of fall injuries in infants, 13% in toddlers and 12% in preschoolers. Children were more likely to be hospitalised if they fell from another person (7.4% vs 2.6% for all other sources; p<0.01). After adjusting for age, the odds of a child being hospitalised following a fall from another person were 2.1 times higher than falling from other surfaces (95% CI 1.6 to 2.7). CONCLUSIONS: The prevalence of injuries due to falling off the bed, and the elevated risk of serious injury from falling from another person highlights the need for more robust and effective communication to caregivers on fall injury prevention.


Subjects
Wounds and Injuries , Infant , Humans , Child , Child, Preschool , Infant, Newborn , Cross-Sectional Studies , Prevalence , Wounds and Injuries/epidemiology
7.
J Am Med Inform Assoc ; 30(8): 1418-1428, 2023 07 19.
Article in English | MEDLINE | ID: mdl-37178155

ABSTRACT

OBJECTIVE: This study aimed to develop a natural language processing algorithm (NLP) using machine learning (ML) techniques to identify and classify documentation of preoperative cannabis use status. MATERIALS AND METHODS: We developed and applied a keyword search strategy to identify documentation of preoperative cannabis use status in clinical documentation within 60 days of surgery. We manually reviewed matching notes to classify each documentation into 8 different categories based on context, time, and certainty of cannabis use documentation. We applied 2 conventional ML and 3 deep learning models against manual annotation. We externally validated our model using the MIMIC-III dataset. RESULTS: The tested classifiers achieved classification results close to human performance with up to 93% and 94% precision and 95% recall of preoperative cannabis use status documentation. External validation showed consistent results with up to 94% precision and recall. DISCUSSION: Our NLP model successfully replicated human annotation of preoperative cannabis use documentation, providing a baseline framework for identifying and classifying documentation of cannabis use. We add to NLP methods applied in healthcare for clinical concept extraction and classification, mainly concerning social determinants of health and substance use. Our systematically developed lexicon provides a comprehensive knowledge-based resource covering a wide range of cannabis-related concepts for future NLP applications. CONCLUSION: We demonstrated that documentation of preoperative cannabis use status could be accurately identified using an NLP algorithm. This approach can be employed to identify comparison groups based on cannabis exposure for growing research efforts aiming to guide cannabis-related clinical practices and policies.


Subjects
Cannabis , Electronic Health Records , Humans , Natural Language Processing , Algorithms , Documentation
8.
J Biomed Inform ; 142: 104343, 2023 06.
Article in English | MEDLINE | ID: mdl-36935011

ABSTRACT

Clinical documentation in electronic health records contains crucial narratives and details about patients and their care. Natural language processing (NLP) can unlock the information conveyed in clinical notes and reports, and thus plays a critical role in real-world studies. The NLP Working Group at the Observational Health Data Sciences and Informatics (OHDSI) consortium was established to develop methods and tools to promote the use of textual data and NLP in real-world observational studies. In this paper, we describe a framework for representing and utilizing textual data in real-world evidence generation, including representations of information from clinical text in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), the workflow and tools that were developed to extract, transform and load (ETL) data from clinical notes into tables in OMOP CDM, as well as current applications and specific use cases of the proposed OHDSI NLP solution at large consortia and individual institutions with English textual data. Challenges faced and lessons learned during the process are also discussed to provide valuable insights for researchers who are planning to implement NLP solutions in real-world studies.


Subjects
Data Science , Medical Informatics , Humans , Electronic Health Records , Natural Language Processing , Narration
9.
J Clin Med ; 11(4), 2022 Feb 11.
Article in English | MEDLINE | ID: mdl-35207214

ABSTRACT

People living with Alzheimer's disease (AD) and AD-related dementias (ADRDs) are at a higher risk of suicidal behaviors given intersecting risk factors. Previous studies generally only focused on AD, small clinical samples, or grouped all dementia subtypes together, limiting insights for other ADRD subtypes. The objective of this study was to generate evidence related to the relative burden of suicidal behaviors (suicidal ideation and suicide attempt) among people with AD and ADRDs. This retrospective cross-sectional study identified hospitalizations related to suicidal behaviors (suicidal ideation and suicide attempt) for patients with Alzheimer's disease (AD) and AD-related dementias using ICD-10-CM codes from the Nationwide Readmissions Database (NRD). A logistic regression model was estimated to assess associations between AD/ADRD subtype and patient characteristics, and the risk for a suicidal-behavior-related hospitalization and modes of harm were reported. During 2016-2018, there were 12,538 hospitalizations related to suicidal behaviors for people with AD/ADRDs. The overall prevalence of suicidal-behavior-related hospitalizations was lowest for AD (0.8%) and highest for frontotemporal dementia (2.6%). Among hospitalizations for suicide attempts, the most common mode of harm was medications or drugs (89.2% of all attempts), followed by weapons (17.7%). We found that there was a difference in the frequency of suicidal-behavior-related hospitalizations among AD/ADRD hospitalized patients across dementia subtypes.

10.
JMIR Med Inform ; 10(2): e29803, 2022 Feb 24.
Article in English | MEDLINE | ID: mdl-35200154

ABSTRACT

BACKGROUND: Prediabetes affects 1 in 3 US adults. Most are not receiving evidence-based interventions, so understanding how providers discuss prediabetes with patients will inform how to improve their care. OBJECTIVE: This study aimed to develop a natural language processing (NLP) algorithm using machine learning techniques to identify discussions of prediabetes in narrative documentation. METHODS: We developed and applied a keyword search strategy to identify discussions of prediabetes in clinical documentation for patients with prediabetes. We manually reviewed matching notes to determine which represented actual prediabetes discussions. We applied 7 machine learning models against our manual annotation. RESULTS: Machine learning classifiers were able to achieve classification results that were close to human performance with up to 98% precision and recall to identify prediabetes discussions in clinical documentation. CONCLUSIONS: We demonstrated that prediabetes discussions can be accurately identified using an NLP algorithm. This approach can be used to understand and identify prediabetes management practices in primary care, thereby informing interventions to improve guideline-concordant care.

11.
JAMIA Open ; 5(1): ooac006, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35224458

ABSTRACT

OBJECTIVE: To evaluate whether a natural language processing (NLP) algorithm could be adapted to extract, with acceptable validity, markers of residential instability (ie, homelessness and housing insecurity) from electronic health records (EHRs) of 3 healthcare systems. MATERIALS AND METHODS: We included patients 18 years and older who received care at 1 of 3 healthcare systems from 2016 through 2020 and had at least 1 free-text note in the EHR during this period. We conducted the study independently; the NLP algorithm logic and method of validity assessment were identical across sites. The approach to the development of the gold standard for assessment of validity differed across sites. Using the EntityRuler module of spaCy 2.3 Python toolkit, we created a rule-based NLP system made up of expert-developed patterns indicating residential instability at the lead site and enriched the NLP system using insight gained from its application at the other 2 sites. We adapted the algorithm at each site then validated the algorithm using a split-sample approach. We assessed the performance of the algorithm by measures of positive predictive value (precision), sensitivity (recall), and specificity. RESULTS: The NLP algorithm performed with moderate precision (0.45, 0.73, and 1.0) at 3 sites. The sensitivity and specificity of the NLP algorithm varied across 3 sites (sensitivity: 0.68, 0.85, and 0.96; specificity: 0.69, 0.89, and 1.0). DISCUSSION: The performance of this NLP algorithm to identify residential instability in 3 different healthcare systems suggests the algorithm is generally valid and applicable in other healthcare systems with similar EHRs. CONCLUSION: The NLP approach developed in this project is adaptable and can be modified to extract types of social needs other than residential instability from EHRs across different healthcare systems.
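
One reason positive predictive value (precision) can differ between sites even when the algorithm is unchanged is that it depends on how common the condition is in the validation sample. The sketch below works through that relationship with hypothetical counts and prevalences; none of the numbers are taken from the three sites, and the function names are assumptions.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Sensitivity (recall) and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value implied by Bayes' rule.

    PPV = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    """
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical validation sample: 25 true cases among 100 reviewed patients.
sens, spec = sensitivity_specificity(tp=17, fp=3, fn=8, tn=72)

# Same sensitivity and specificity, two hypothetical prevalences.
ppv_common = ppv_from_rates(0.8, 0.9, prevalence=0.5)
ppv_rare = ppv_from_rates(0.8, 0.9, prevalence=0.1)
```

With sensitivity and specificity held fixed, the rarer the condition in the sample, the lower the precision, which is worth keeping in mind when gold standards are built differently at each site.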

12.
Front Public Health ; 9: 697501, 2021.
Article in English | MEDLINE | ID: mdl-34513783

ABSTRACT

Introduction: Despite the growing efforts to standardize coding for social determinants of health (SDOH), they are infrequently captured in electronic health records (EHRs). Most SDOH variables are still captured in the unstructured fields (i.e., free-text) of EHRs. In this study we attempt to evaluate a practical text mining approach (i.e., advanced pattern matching techniques) in identifying phrases referring to housing issues, an important SDOH domain affecting value-based healthcare providers, using EHR of a large multispecialty medical group in the New England region, United States. To present how this approach would help the health systems to address the SDOH challenges of their patients we assess the demographic and clinical characteristics of patients with and without housing issues and briefly look into the patterns of healthcare utilization among the study population and for those with and without housing challenges. Methods: We identified five categories of housing issues [i.e., homelessness current (HC), homelessness history (HH), homelessness addressed (HA), housing instability (HI), and building quality (BQ)] and developed several phrases addressing each one through collaboration with SDOH experts, consulting the literature, and reviewing existing coding standards. We developed pattern-matching algorithms (i.e., advanced regular expressions), and then applied them in the selected EHR. We assessed the text mining approach for recall (sensitivity) and precision (positive predictive value) after comparing the identified phrases with manually annotated free-text for different housing issues. Results: The study dataset included EHR structured data for a total of 20,342 patients and 2,564,344 free-text clinical notes. The mean (SD) age in the study population was 75.96 (7.51). Additionally, 58.78% of the cohort were female. BQ and HI were the most frequent housing issues documented in EHR free-text notes and HH was the least frequent one. 
The regular expression methodology, when compared to manual annotation, had a high level of precision (positive predictive value) at phrase, note, and patient levels (96.36, 95.00, and 94.44%, respectively) across different categories of housing issues, but the recall (sensitivity) rate was relatively low (30.11, 32.20, and 41.46%, respectively). Conclusion: Results of this study can be used to advance the research in this domain, to assess the potential value of EHR's free-text in identifying patients with a high risk of housing issues, to improve patient care and outcomes, and to eventually mitigate socioeconomic disparities across individuals and communities.
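
The phrase, note, and patient levels of evaluation mentioned above can be pictured as a roll-up of regular-expression matches. In this sketch the category codes follow the abstract (HC, HH, HI), but the patterns themselves, the sample notes, and the names `PATTERNS`, `extract_phrases`, and `roll_up` are hypothetical; the study's expressions are not reproduced in the abstract.

```python
import re
from collections import defaultdict

# Hypothetical patterns for three of the five categories.
PATTERNS = {
    "HC": re.compile(r"\b(?:is|currently)\s+homeless\b", re.IGNORECASE),
    "HH": re.compile(
        r"\b(?:history of|formerly|previously)\s+homeless(?:ness)?\b",
        re.IGNORECASE,
    ),
    "HI": re.compile(r"\b(?:facing|risk of)\s+eviction\b", re.IGNORECASE),
}


def extract_phrases(notes):
    """Phrase level: one hit per match in (patient_id, note_id, text) records."""
    hits = []
    for patient_id, note_id, text in notes:
        for category, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                hits.append((patient_id, note_id, category, match.group(0)))
    return hits


def roll_up(hits):
    """Note and patient levels: distinct notes and patients per category."""
    notes_by_category = defaultdict(set)
    patients_by_category = defaultdict(set)
    for patient_id, note_id, category, _phrase in hits:
        notes_by_category[category].add(note_id)
        patients_by_category[category].add(patient_id)
    return notes_by_category, patients_by_category


# Invented notes for illustration only.
sample = [
    ("p1", "n1", "Pt is homeless. Pt is homeless per intake."),
    ("p1", "n2", "History of homelessness in 2015."),
    ("p2", "n3", "Facing eviction next month."),
]
hits = extract_phrases(sample)
notes_by_category, patients_by_category = roll_up(hits)
```

Narrow patterns like these tend to give the profile reported above: few false matches, so precision is high, but many differently worded mentions are missed, so recall is low.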


Subjects
Electronic Health Records , Housing , Data Mining , Female , Humans , Retrospective Studies , Social Determinants of Health , United States
13.
J Am Med Inform Assoc ; 28(6): 1275-1283, 2021 06 12.
Article in English | MEDLINE | ID: mdl-33674830

ABSTRACT

The COVID-19 pandemic swept across the world rapidly, infecting millions of people. An efficient tool that can accurately recognize important clinical concepts of COVID-19 from free text in electronic health records (EHRs) will be valuable to accelerate COVID-19 clinical research. To this end, this study aims at adapting the existing CLAMP natural language processing tool to quickly build COVID-19 SignSym, which can extract COVID-19 signs/symptoms and their 8 attributes (body location, severity, temporal expression, subject, condition, uncertainty, negation, and course) from clinical text. The extracted information is also mapped to standard concepts in the Observational Medical Outcomes Partnership common data model. A hybrid approach of combining deep learning-based models, curated lexicons, and pattern-based rules was applied to quickly build the COVID-19 SignSym from CLAMP, with optimized performance. Our extensive evaluation using 3 external sites with clinical notes of COVID-19 patients, as well as the online medical dialogues of COVID-19, shows COVID-19 SignSym can achieve high performance across data sources. The workflow used for this study can be generalized to other use cases, where existing clinical natural language processing tools need to be customized for specific information needs within a short time. COVID-19 SignSym is freely accessible to the research community as a downloadable package (https://clamp.uth.edu/covid/nlp.php) and has been used by 16 healthcare organizations to support clinical research of COVID-19.


Subjects
COVID-19/diagnosis , Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Deep Learning , Humans , Symptom Assessment/methods
15.
Ann Intern Med ; 174(1): 33-41, 2021 01.
Article in English | MEDLINE | ID: mdl-32960645

ABSTRACT

BACKGROUND: Risk factors for progression of coronavirus disease 2019 (COVID-19) to severe disease or death are underexplored in U.S. cohorts. OBJECTIVE: To determine the factors on hospital admission that are predictive of severe disease or death from COVID-19. DESIGN: Retrospective cohort analysis. SETTING: Five hospitals in the Maryland and Washington, DC, area. PATIENTS: 832 consecutive COVID-19 admissions from 4 March to 24 April 2020, with follow-up through 27 June 2020. MEASUREMENTS: Patient trajectories and outcomes, categorized by using the World Health Organization COVID-19 disease severity scale. Primary outcomes were death and a composite of severe disease or death. RESULTS: Median patient age was 64 years (range, 1 to 108 years); 47% were women, 40% were Black, 16% were Latinx, and 21% were nursing home residents. Among all patients, 131 (16%) died and 694 (83%) were discharged (523 [63%] had mild to moderate disease and 171 [20%] had severe disease). Of deaths, 66 (50%) were nursing home residents. Of 787 patients admitted with mild to moderate disease, 302 (38%) progressed to severe disease or death: 181 (60%) by day 2 and 238 (79%) by day 4. Patients had markedly different probabilities of disease progression on the basis of age, nursing home residence, comorbid conditions, obesity, respiratory symptoms, respiratory rate, fever, absolute lymphocyte count, hypoalbuminemia, troponin level, and C-reactive protein level and the interactions among these factors. Using only factors present on admission, a model to predict in-hospital disease progression had an area under the curve of 0.85, 0.79, and 0.79 at days 2, 4, and 7, respectively. LIMITATION: The study was done in a single health care system. CONCLUSION: A combination of demographic and clinical variables is strongly associated with severe COVID-19 disease or death and their early onset. 
The COVID-19 Inpatient Risk Calculator (CIRC), using factors present on admission, can inform clinical and resource allocation decisions. PRIMARY FUNDING SOURCE: Hopkins inHealth and COVID-19 Administrative Supplement for the HHS Region 3 Treatment Center from the Office of the Assistant Secretary for Preparedness and Response.


Subjects
COVID-19/mortality , Hospital Mortality , Hospitalization , Severity of Illness Index , Adolescent , Adult , Aged , Aged, 80 and over , Child , Child, Preschool , Disease Progression , Female , Humans , Infant , Male , Middle Aged , Pandemics , Retrospective Studies , Risk Factors , SARS-CoV-2 , United States/epidemiology
16.
Popul Health Manag ; 24(2): 222-230, 2021 04.
Article in English | MEDLINE | ID: mdl-32598228

ABSTRACT

As the US health care system moves to expand access to and quality of medical care, the importance of addressing patient-level social needs and community-level social determinants of health (SDOH) is increasingly being recognized. This study evaluates individual- and community-level needs of housing (one of the SDOH domains) across the patient population of an academic medical center and explores how the level of housing needs impacts health care utilization. The authors performed a descriptive analysis of housing issues identified in both structured and unstructured (eg, clinical notes) data extracted from the electronic health record (EHR) and compared this to community-level characteristics of patients' neighborhood as measured by the Area Deprivation Index. Multivariate analyses were performed to assess the association between these and other factors on the frequency of service encounters. Among the 1,034,683 study participants, 59,703 (5.8%) had at least 1 housing issue identified in their EHR from structured or unstructured data combined. After adjusting for other factors, patients with housing instability and homelessness had 49% and 34% more encounters with the health care system compared to patients without housing issues (P < 0.00001). Patients living in the most disadvantaged neighborhoods had 55% more encounters with the health care system compared to those living in the most advantaged neighborhoods (P < 0.00001). This data collection approach and findings can inform health care systems aiming to make use of their EHRs and community-level SDOH information to provide a full assessment of patients' social needs and challenges.


Subjects
Medicare , Social Determinants of Health , Aged , Electronic Health Records , Female , Humans , Male , Patient Acceptance of Health Care , Residence Characteristics , United States
17.
ArXiv ; 2020 Jul 13.
Article in English | MEDLINE | ID: mdl-32908948

ABSTRACT

The COVID-19 pandemic swept across the world rapidly, infecting millions of people. An efficient tool that can accurately recognize important clinical concepts of COVID-19 from free text in electronic health records (EHRs) will be valuable to accelerate COVID-19 clinical research. To this end, this study aims at adapting the existing CLAMP natural language processing tool to quickly build COVID-19 SignSym, which can extract COVID-19 signs/symptoms and their 8 attributes (body location, severity, temporal expression, subject, condition, uncertainty, negation, and course) from clinical text. The extracted information is also mapped to standard concepts in the Observational Medical Outcomes Partnership common data model. A hybrid approach of combining deep learning-based models, curated lexicons, and pattern-based rules was applied to quickly build the COVID-19 SignSym from CLAMP, with optimized performance. Our extensive evaluation using 3 external sites with clinical notes of COVID-19 patients, as well as the online medical dialogues of COVID-19, shows COVID-19 SignSym can achieve high performance across data sources. The workflow used for this study can be generalized to other use cases, where existing clinical natural language processing tools need to be customized for specific information needs within a short time. COVID-19 SignSym is freely accessible to the research community as a downloadable package (https://clamp.uth.edu/covid/nlp.php) and has been used by 16 healthcare organizations to support clinical research of COVID-19.

18.
NPJ Digit Med ; 2: 106, 2019.
Article in English | MEDLINE | ID: mdl-31701020

ABSTRACT

End-stage liver disease (ESLD) is associated with cognitive impairment ranging from subtle alterations in attention to overt hepatic encephalopathy that resolves after transplant. Natural language processing (NLP) may provide a useful method to assess cognitive status in this population. We identified 81 liver transplant recipients with ESLD (4/2013-2/2018) who sent at least one patient-to-provider electronic message pre-transplant and post-transplant, and matched them 1:1 to "healthy" controls (who had similar disease but had not been evaluated for liver transplant) by age, gender, race/ethnicity, and liver disease. Messages written by patients pre-transplant and post-transplant and by controls were compared across 19 NLP measures using paired Wilcoxon signed-rank tests. While there was no difference overall in word length, patients with Model for End-Stage Liver Disease Score (MELD) ≥ 30 (n = 31) had decreased word length in pre-transplant messages (3.95 [interquartile range (IQR) 3.79, 4.14]) compared to post-transplant (4.13 [3.96, 4.28], p = 0.01) and controls (4.2 [4.0, 4.4], p = 0.01); there was no difference between post-transplant and controls (p = 0.4). Patients with MELD ≥ 30 had fewer 6+ letter words in pre-transplant messages (19.5% [16.4, 25.9]) compared to post-transplant (23.4% [20.0, 26.7]; p = 0.02) and controls (25.0% [19.2, 29.4]; p = 0.01). Overall, patients had increased sentence length pre-transplant (12.0 [9.8, 13.7]) compared to post-transplant (11.0 [9.2, 13.3]; p = 0.046); the same was seen for MELD ≥ 30 (12.3 [9.8, 13.7] pre-transplant vs. 10.8 [9.6, 13.0] post-transplant; p = 0.050). Application of NLP to patient-generated messages identified language differences (longer sentences with shorter words) that resolved after transplant. NLP may provide opportunities to detect cognitive impairment in ESLD.
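
Three of the measures named above (word length, share of words with six or more letters, and sentence length) are simple to compute from a message. The sketch below is one plausible reading of them, not the study's tooling: the tokenization rules, the function name `message_measures`, and the sample message are all assumptions.

```python
import re


def message_measures(message):
    """Mean word length, percent of 6+ letter words, mean words per sentence."""
    # Sentences are split on runs of ., ! or ?; words are runs of letters.
    sentences = [s for s in re.split(r"[.!?]+", message) if s.strip()]
    words = re.findall(r"[A-Za-z]+", message)
    if not words or not sentences:
        return None
    long_words = sum(1 for word in words if len(word) >= 6)
    return {
        "mean_word_length": sum(len(word) for word in words) / len(words),
        "pct_six_plus_letter_words": 100 * long_words / len(words),
        "mean_sentence_length": len(words) / len(sentences),
    }


# Invented patient message for illustration only.
measures = message_measures("I feel tired today. Please refill my medication.")
```

Per-patient values computed this way before and after transplant would form the paired samples that a signed-rank test compares.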

19.
JMIR Med Inform ; 7(3): e13802, 2019 Aug 02.
Article in English | MEDLINE | ID: mdl-31376277

ABSTRACT

BACKGROUND: Most US health care providers have adopted electronic health records (EHRs) that facilitate the uniform collection of clinical information. However, standardized data formats to capture social and behavioral determinants of health (SBDH) in structured EHR fields are still evolving and not adopted widely. Consequently, at the point of care, SBDH data are often documented within unstructured EHR fields that require time-consuming and subjective methods to retrieve. Meanwhile, collecting SBDH data using traditional surveys on a large sample of patients is infeasible for health care providers attempting to rapidly incorporate SBDH data in their population health management efforts. A potential approach to facilitate targeted SBDH data collection is applying information extraction methods to EHR data to prescreen the population for identification of immediate social needs. OBJECTIVE: Our aim was to examine the availability and characteristics of SBDH data captured in the EHR of a multilevel academic health care system that provides both inpatient and outpatient care to patients with varying SBDH across Maryland. METHODS: We measured the availability of selected patient-level SBDH in both structured and unstructured EHR data. We assessed various SBDH including demographics, preferred language, alcohol use, smoking status, social connection and/or isolation, housing issues, financial resource strains, and availability of a home address. EHR's structured data were represented by information collected between January 2003 and June 2018 from 5,401,324 patients. EHR's unstructured data represented information captured for 1,188,202 patients between July 2016 and May 2018 (a shorter time frame because of limited availability of consistent unstructured data). We used text-mining techniques to extract a subset of SBDH factors from EHR's unstructured data. 
RESULTS: We identified a valid address or zip code for 5.2 million (95.00%) of approximately 5.4 million patients. Ethnicity was captured for 2.7 million (50.00%), whereas race was documented for 4.9 million (90.00%) and a preferred language for 2.7 million (49.00%) patients. Information regarding alcohol use and smoking status was coded for 490,348 (9.08%) and 1,728,749 (32.01%) patients, respectively. Using the International Classification of Diseases-10th Revision diagnosis codes, we identified 35,171 (0.65%) patients with information related to social connection/isolation, 10,433 (0.19%) patients with housing issues, and 3,543 (0.07%) patients with income/financial resource strain. Of approximately 1.2 million unique patients with unstructured data, 30,893 (2.60%) had at least one clinical note containing phrases referring to social connection/isolation, 35,646 (3.00%) included housing issues, and 11,882 (1.00%) had mentions of financial resource strain. CONCLUSIONS: Apart from demographics, SBDH data are not regularly collected for patients. Health care providers should assess the availability and characteristics of SBDH data in EHRs. Evaluating the quality of SBDH data can potentially enable health care providers to modify underlying workflows to improve the documentation, collection, and extraction of SBDH data from EHRs.
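The unstructured-data counts above are of patients with at least one note containing a relevant phrase. A minimal sketch of that kind of phrase matching is shown below; the phrase lists, category names, and function names are hypothetical and are not the study's actual text-mining rules.

```python
import re
from collections import defaultdict

# Hypothetical phrase lists, for illustration only.
SBDH_PATTERNS = {
    "social_isolation": re.compile(
        r"\b(lives alone|socially isolated|no social support)\b", re.I),
    "housing": re.compile(
        r"\b(homeless|eviction|unstable housing)\b", re.I),
    "financial_strain": re.compile(
        r"\b(cannot afford|financial strain|unable to pay)\b", re.I),
}


def patients_with_mentions(notes):
    """Map each category to the set of patient ids with at least one matching note.

    `notes` is an iterable of (patient_id, note_text) pairs.
    """
    found = defaultdict(set)
    for patient_id, text in notes:
        for category, pattern in SBDH_PATTERNS.items():
            if pattern.search(text):
                found[category].add(patient_id)
    return found


def prevalence(found, n_patients):
    """Percentage of patients with at least one mention, per category."""
    return {
        category: 100 * len(found.get(category, ())) / n_patients
        for category in SBDH_PATTERNS
    }


notes = [
    (1, "Patient lives alone."),
    (1, "Currently homeless."),
    (2, "Cannot afford medication."),
    (3, "No concerns."),
]
found = patients_with_mentions(notes)
print(prevalence(found, n_patients=4))
```

Counting patients rather than notes avoids inflating the figures for patients who have many notes.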

20.
Proc Conf ; 2015: 117-123, 2015 Jun 05.
Article in English | MEDLINE | ID: mdl-28691123

ABSTRACT

Restrictive and repetitive behavior (RRB) is a core symptom of autism spectrum disorder (ASD) and is manifest in language. Based on this, we expect children with autism to talk about fewer topics, and more repeatedly, during their conversations. We thus hypothesize a higher semantic overlap ratio between dialogue turns in children with ASD compared to those with typical development (TD). Participants in this study include children ages 4-8, 44 with TD and 25 with ASD without language impairment. We apply several semantic similarity metrics to the children's dialogue turns in semi-structured conversations with examiners. We find that children with ASD have significantly more semantically overlapping turns than children with TD, across different turn intervals. These results support our hypothesis and could provide a convenient and robust ASD-specific behavioral marker.
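One way to picture an overlap ratio across turn intervals is the sketch below. It substitutes plain Jaccard word overlap for the semantic similarity metrics used in the study (which the abstract does not name), and the threshold, tokenization, and function names are assumptions.

```python
import re


def tokens(turn):
    """Lowercased set of alphabetic word tokens in one dialogue turn."""
    return set(re.findall(r"[a-z]+", turn.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_ratio(turns, interval=1, threshold=0.3):
    """Fraction of turn pairs `interval` turns apart whose similarity
    reaches `threshold`. `interval` is assumed to be at least 1."""
    token_sets = [tokens(t) for t in turns]
    pairs = [
        (token_sets[i], token_sets[i + interval])
        for i in range(len(token_sets) - interval)
    ]
    if not pairs:
        return 0.0
    overlapping = sum(jaccard(a, b) >= threshold for a, b in pairs)
    return overlapping / len(pairs)


child_turns = [
    "I like trains",
    "trains are big",
    "I like big trains",
    "What about lunch",
]
print(overlap_ratio(child_turns, interval=1))
print(overlap_ratio(child_turns, interval=2))
```

Only the child's turns are passed in here; a similarity measure based on word meaning rather than exact word match would be needed to capture paraphrased repetition.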

...